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APPARATUS AND METHOD FOR 
COMPRESSING VIDEO INFORMATION 

5 CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims priority to provisional application Serial No. 
60/066,638, filed November 14, 1997, which is herein incorporated by reference. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

10 The present invention relates generally to apparatus and methods for encoding 

and decoding video information. More particularly, the present invention relates to an 
apparatus and method for motion estimation and motion prediction in the transform 
domain. 

2. Background of the Related Art 

15 Due to the limited bandwidth available on transmission channels, only a limited 

number of bits are available to encode audio and video information. Video encoding 
techniques attempt to encode video information with as few bits as possible, while still 
maintaining the image quality required for a given application. Thus, video 
compression techniques attempt to reduce the bandwidth required to transmit a video 

20 signal by removing redundant information and representing the remaining information 
with a minimum number of bits, from which an approximation to the original image 
can be reconstructed, with a minimal loss of important features. In this manner, the 
compressed data can be stored or transmitted in a more efficient manner than the 
original image data. 

25 There are a number of video encoding techniques which improve coding 

efficiency by removing statistical redundancy from video signals. Many standard 
image compression schemes are based on block transforms of the input image such as 
the Discrete Cosine Transform (DCT). The well-known MPEG video encoding 
technique, for example, developed by the Motion Pictures Experts Group, achieves 

30 significant bit rate reductions by taking advantage of the correlation between pixels 
(pels) in the spatial domain (through the use of the DCT), and the correlation between 
image frames in the time domain (through the use of prediction and motion 
compensation). 

In well-known orthogonal and bi-orthogonal (subband) transform based 
35 encoding systems (inclusive of lapped orthogonal transforms), an image is transformed 
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without the necessity of first blocking the image. Transform encoders based on DCT 
block the image primarily for two reasons: 1) experience has shown that the DCT is a 
good approximation to the known optimal transform (Kahunen-Luove') on 8 x 8 
regions of the image or a sequence of difference images; and 2) the processing of DCT 
5 grows 0(N log N ) and through the blocking of the image, computational effort is 
limited. 

The end result is that DCT based approaches, unless otherwise enhanced, have 
basis functions which are compactly supported by (or zero outside of) an 8 x 8 region of 
an image. The orthogonal and bi-orthogonal transforms under consideration have basis 

10 members which are predominately supported in a finite interval of the image, but share 
extent with neighboring spatial regions. Subband image encoding techniques, for 
example, divide an input image into a plurality of spatial frequency bands, using a set 
of filters and then quantize each band or channel. For a detailed discussion of subband 
image encoding techniques see Subband Video Coding With Dynamic Bit Allocation 

15 and Geometric Vector Quantization , C. Podilchuck & A. Jacquin, SPIE Vol. 1666 

Human Vision, Visual Processing, and Digital Display III, pp. 241-52 (Feb. 1992). At 
each stage of the subband encoding process, the signal is split into a low pass 
approximation of the image, and a high pass term representing the detail lost by making 
the approximation. 

20 In addition, DCT based transform encoders are translation invariant in the sense 

that the base members have a support which extends over the entire 8x8 block. This 
prevents motion compensation from being done efficiently in the transform domain. 
Therefore, most of the motion compensation techniques in use utilize temporally 
adjacent image frames to form an error term which is then transform coded on an 8 x 8 

25 block. As a consequence, these techniques require an inverse transform to be carried 
out to supply a reference frame from the frequency domain to the time domain. 
Examples of such systems are found in U.S. Patent No. 5,481,553 to Suzuki et al and 
U.S. Patent No. 5,025,482 to Murakami et al. 

FIG. 1 illustrates a simplified block diagram of a prior art standard video 

30 compression approach using DCT. In block 10, the changes in the image sequence are 
efficiently represented through motion detection techniques such as one technique used 
in MPEG when in predictive mode. In particular, a previous frame is used as a 
reference frame and a subsequent frame, in forward prediction, is compared against the 
previous frame to eliminate temporal redundancies and rank the differences between 

35 them according to degree. This step sets the stage for motion prediction of the 

subsequent frame and also reduces the data size of the subsequent frame. In block 12, a 
determination is made as to which parts of the image have moved. Continuing with the 
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MPEG example, using the data set provided by block 1 0, interframe motion prediction 
is carried out by applying motion compensation techniques to the reference frame and 
subsequent frame. The resulting prediction is subtracted from the subsequent frame to 
generate a prediction error/frame. Thereafter, in block 14, the changes are converted to 
5 features. In MPEG, this is done by compressing the prediction error using a 2- 
dimensional 8x8 DCT. 

Most video compression techniques based on DCT or subband encoders have 
focused on high precision techniques that attempt to encode video information without 
a loss of accuracy in the transform stage. Such high precision encoding techniques, 
10 however, rely on relatively expensive microprocessors, such as Intel Corporation's 
PENTIUM® processor, which have dedicated hardware to aid in the manipulation of 
floating point arithmetic and thereby reduce the penalty for maintaining a high degree 
of precision. 

For many applications, however, such relatively expensive hardware is not 
15 practical or justified. Thus, a lower cost implementation, which also maintains 

acceptable image quality levels, is required. Known limited precision transforms that 
may be implemented on lower-cost hardware, however, tend to exhibit reduced 
accuracy as a result of the "lossy" nature of the encoding process. As used herein, a 
"lossy" system refers to a system that loses precision through the various stages of the 
20 encoder and thereby lacks the ability to substantially reconstruct the input from the 
transform coefficients when decoding. The inability to compensate for the reduced 
accuracy exhibited by these low precision transforms have been an impediment to the 
use of such transforms. 

In view of the foregoing, there is a need for a video encoder that performs the 
25 motion compensation in the transform domain, thereby eliminating the requirement of 
an inverse transform in the encoder and enabling a simple control structure for software 
and hardware devices. There is also a need in the art for a video encoder having a class 
of transforms which are suitable for low precision implementation, including a control 
structure which enables low cost hardware and high speed software devices. 

30 

SUMMARY OF THE INVENTION 

The subject invention is directed to a novel and unique apparatus and method for 
compressing data. More particularly, the present apparatus and method are adapted and 
configured to more efficiently encode data representing, for example, a video image, thereby 
35 reducing the amount of data that must be transferred to a decoder. 

The invention concems a method of compressing data that includes a first data set and a 
second data set. The method includes transforming the first and second data sets into 
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corresponding first and second transform coefficient sets. Thereafter, data is generated which 
represents differences between the first and second transform coefficient sets. The generated 
data is then encoded for transmission to the decoder. 

Transforming the first and second data sets may be performed utiHzing a tensor product 
5 wavelet transform. Further, the remainders resulting from the transforming process may be 
transmitted from one subband to another subband. 

The data representing differences between the first and second transform coefficient 
sets is generated by estimating the differences between the first and second transform 
coefficient sets to provide motion vectors. The motion vectors are applied to the first transform 

10 coefficient set to produce a prediction of the second transform coefficient set. The prediction is 
subtracted from the second transform coefficient set resulting in a set of prediction errors. The 
first and second transform coefficient sets can be error corrected to ensure synchronization 
between the encoder and the decoder. 

In estimating the differences between the first and second transform coefficient sets a 

15 search region is generated about a subset of the transform coefficients from one of the first and 
the second transform coefficient sets. Thereafter, a related subset of transform coefficients is 
applied from the other of the first and the second transform coefficient sets to the search region. 
Then, the related subset of transform coefficients is traversed incrementally within the search 
region to a position representing a best incremental match. The related subset can then be 

20 traversed fractionally within the search region to a position representing a best fractional 
match. 

Another embodiment of the method of compressing data that includes a first data set 
and a second data set includes transforming the first and second data sets into corresponding 
first and second collections of subbands. Then, generating data representing the differences 

25 between the first and second collections of subbands. The data may be generated, for example, 
by carrying out a motion compensation technique. The motion compensation technique may 
provide output such as motion vectors and prediction errors. Thereafter, the generated data is 
encoded for transmission to the decoder. 

An embodiment may also the second collection of subbands macro-block packed to 

30 form a subband macro-block grouping. Thereafter, the generated data may be obtained through 
a motion compensation technique as follows. The differences between the first collection of 
subbands and the subband macro-block grouping is estimated to provide motion vectors. The 
motion vectors are applied to the first collection of subbands producing a prediction of the 
second collection of subbands. The prediction is then subtracted fi*om the second collection of 

35 subbands resulting in a set of prediction errors. 

The differences can be estimated between the first collection of subbands and the 
subband macro-block grouping as follows. A search region is generated about a subset of 
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transform coefficients from the first collection of subbands. A related subset of transform 
coefficients from the subband macro-block grouping is applied to the search region. The 
related subset of transform coefficients is then traversed incrementally within the search region 
to a position representing a best incremental match. Then, the related subset of transform 
coefficients is traversed fractionally within the search region to a position representing a best 
fractional match. 

A subband macro-block packing method is also disclosed for organizing subband 
blocks of a collection of subbands derived from a transform of an image. The method includes 
disassociating a set of related subband blocks from a collection of subbands that correspond to 
an image macro-block in the image. The set of related subband blocks are packed together as a 
subband macro-block. The steps of the disassociating and packing related subband blocks are 
repeated for each set of related subband blocks in the collection of subbands to form a subband 
macro-block grouping. 

The method for macro-block packing may be further refined by arranging the set of 
related subband blocks within the subband macro-block in the same relative position the 
subband blocks occupy in the collection of subbands. The method may also include locating 
the subband macro-block within the subband macro-block grouping in the same spatial 
location as the corresponding image macro-block is located within the image macro-block 
grouping. 

After macro-block packing, changes can be detected between the first subband macro- 
block grouping (reference) and a subsequent second subband macro-block grouping. Detecting 
is based on a distortion evaluation according to a general equation of the form: 

i 

where: = measurement of distortion relative to reference R; 
— applied weight; 
G = transform coefficients of the second subband macro- 
block grouping; and 
R = reference (e.g., first subband macro-block grouping), 

A more specific form of the equation for evaluating distortion is of the form: 

e,=W,\\G-Rg+W,\\G-Rl 

Another embodiment of the present invention is described as a finite precision method 
for transforming a data set into transform coefficients wherein the data set is transformed 
utilizing a tensor product wavelet pair and the remainders emanating therefrom are propagated 
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to the opposite filter path. More particularly, the embodiment may include determining a low 
pass component and a high pass component of an image. The low pass component is 
normalized to generate a low pass normalized output and a first remainder (rl). Likewise, the 
high pass component is normalized to generate a high pass normalized output and a second 
remainder (rh). A first operation (g(rl,rh)) is performed on the first and second remainders (rl, 
rh) and added to the results emanating therefrom to the approximation. And, a second 
operation (f(rl,rh)) is also performed on the first and second remainders (rl, rh) and added to the 
results emanating therefrom to the detail. It is important to note that the propagation of the 
remainders (propagation of errors) can be used in any transform, not just the tensor product. 

The above finite precision method results in an overcomplete representation of an 
image. The method may include downsampling, for example, by two (2), of the high and low 
pass components to obtain the necessary and sufficient transform coefficients representing the 
image in the transform domain. 

An embodiment of the finite precision method includes a low pass filter having the 
value -1, 2, 6, 2, -1 and a high pass filter having the value -1, 2, -1 . The first operation 
(g(rl,rh)) and the second operation (f(rl,rh)) have the functions: 



g(rl,rh) = rh; and 

f(rl,rh) = floor(rh + 'A), where nh = 14. 



A particular example of a tensor product wavelet transform including the above has the 



20 form: 



; and 



2 1+1 



D + A.. +2 



25 



^2i-l 



where: = input data; 

data that precedes input data 

data that follows input data ^^2,5 

detail term (decimated high pass filter output); 

Z>,+i = detail term that follows detail term Z),.; and 

~ approximation term (decimated low pass filter 
output). 
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Also disclosed is an encoder apparatus for predicting changes between a sequence of 
frames in the transform domain. The apparatus includes a transformation device, having an 
input configured to receive a first and second frame of the sequence of frames, and further 
configured to generate therefrom a corresponding first and second collection of subbands that 
5 each support a set of transform coefficients. A motion compensation device, having an input 
coupled to the transformation device, is configured to receive the first and second collection of 
subbands, and further configured to efficiently represent differences between the first and 
second collection of subbands. Also included is a difference block having an input coupled to 
the transformation device and an input coupled to the output of the motion compensation 

10 device. The input received from the motion compensation device is subtracted from the second 
collection of subbands in the difference block, thereby generating a prediction errors. 

The motion compensation device includes a motion estimation device configured to 
compare the first and second collection of subbands. A collection of motion vectors is 
generated therefrom which approximately represent the differences between the first and 

15 second collections of subbands. The motion compensation device also includes a motion 
prediction device, having an input coupled to the motion estimation device, configured to 
receive the motion vectors and the first collection of subbands, and further configured to 
generate therefrom a prediction grouping representing a prediction of the second collection of 
subbands. The prediction of the second collection of subbands is subtracted from the second 

20 collection of subbands in a difference block resulting in prediction errors. 

A finite precision transforming apparatus is also disclosed for transforming an image 
frame into the transform domain. The apparatus includes a low pass component and a high 
pass component arranged in parallel and sharing an input that is configured to receive the 
image frame. A low pass normalizing device is included which has an input configured to 

25 receive the output of the low pass component and is further configured to produce a low pass 
normalized output and a first remainder (rl). A high pass normalizing device has an input 
configured to receive the output of the high pass component and is fiirther configured to 
produce a high pass normalized output and a second remainder (rh). A first operation device 
has an input configured to receive the first remainder (rl) and the second remainder (rh) and 

30 further configured to calculate a first calculation (g(rl,rh)) thereby generating a first calculation 
result. A second operation device has an input configured to receive the first remainder (rl) and 
the second remainder (rh) and configured to calculate a second calculation (f(rl,rh)) thereby 
generating a second calculation result. In addition, a first adder has an input configured to 
receive the low pass normalized output and the first calculation result, the first adder 

35 generating a subband approximation. Similarly, a second adder has an input configured to 
receive the high pass normalized output and the second calculation result, the second adder 
generating a subband detail. 
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The finite precision transforming apparatus further includes a first downsampler 
at the low pass output and a second downsampler at the high pass output. A 
downsampling of two (2) provides sufficient and necessary transform coefficients to 
reconstruct the input image in the decoder. 
5 These and other unique features of the apparatus and method disclosed herein 

will become more readily apparent from the following detailed description taken in 
conjunction with the drawings. 

10 BRIEF DESCRIPTION OF THE DRAWINGS 

Representative embodiments of the present invention will be described with 
reference to the following figures: 

FIG. 1 is a schematic block diagram of a prior art standard video compression 
approach using the Discrete Cosine Transform (DCT) wherein motion compensation is 
15 carried out in the image domain; 

FIG. 2 is a schematic block diagram illustrating a general arrangement of an 
embodiment of the present invention including provisions for motion compensation to 
be carried out in the transform domain; 

FIG. 3 is a schematic block diagram of a more detailed arrangement of the 
20 embodiment illustrated in FIG. 2; 

FIG. 4(a) illustrates a QCIF image having image macro-blocks (IMB^^x) 
through 8,10 and FIG. 4(b) illustrates a subband representation of the QCIF image after 
the image frame has been transformed by a forward wavelet transform; 

FIG. 5(a) illustrates the subband representation of the QCIF image as illustrated 
25 in FIG. 4(b), FIG. 5(b) illustrates a collection of subband macro-blocks (SMBx^x) 
generated from the subband representation illustrated in FIG. 5(a), and FIG. 5(c) 
illustrates the organization of the subband macro-blocks of FIG. 5(b) so that the 
subband macro-blocks (SMBx^x) correspond spatially with their related image macro- 
blocks (IMBx^x) of FIG. 4(a); 
30 FIGS. 6(a) and 6(b) are schematic block diagrams illustrating filter banks for 

transforming and decimating an input image and their respective vertical and horizontal 
subbands created from each filter bank; 

FIG. 7 illustrates an architecture for transferring the finite precision arithmetic 
in the filter banks from the high band region to the low band region and, conversely, 
35 ft-om the low band region to the high band region; 

FIG. 8 illustrates search regions in the transform domain for each subband 
(SBy) corresponding to image macro-block 2,4 (IMB2^4) in the image domain wherein 
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the search band is P x P pels, and further details the search region for that of SBqo 
when the input image size is in QCIF; 

FIGS. 9(a) through 9(d) illustrate a method by which motion is estimated in the 
transform domain; 

5 FIG. 10 illustrates a method by which motion is predicted in the transform 

domain; 

FIG. 1 1 is a schematic block diagram illustrating another detailed arrangement 
of the embodiment illustrated in FIG. 2; 

FIG. 12 is a schematic block diagram illustrating another detailed embodiment 
10 of the invention wherein motion estimation is carried out in the image domain and 
motion prediction is carried out in the transform domain; 

FIG. 1 3 illustrates a P x P pels search region when searching in the image 
domain about image macro-block 2,4 (IMB2^4) when the input size is QCIF; and 

FIG. 14 is a schematic block diagram illustrating another detailed embodiment 
15 of the invention wherein motion estimation and motion prediction is carried out in the 
image domain. 



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

20 An embodiment of the present invention provides an apparatus and method for 

compressing digital video signals using a limited precision transformation technique. 
The embodiment improves on conventional loss-less or lossy transform based 
techniques by motion compensating, e.g., estimating and predicting motion, in the 

transform domain, rather than in the time domain as in the prior art. In this manner, 

/ 

25 improved image quality can be achieved on less expensive hardware. 

The term "motion compensation" is intended to be defined in its broadest sense. 
In other words, although motion compensation is often described and is illustrated 
herein as including motion estimation and motion prediction of a group of picture 
elements, it should also be understood to encompass, for example, rotation and scale. 

30 In addition, the term "motion compensation" may include, for example, simply 
generating data representing differences between two sets of data. 

Compression efficiencies are gained by both converting the image to features 
and mapping the features first. The disclosure herein is illustrated as it relates to a 
sequence of images or video frames. Such an image sequence can be readily 

35 understood to be a collection of spatially oriented data elements (either scalar, vector, 
or functional) which are placed in arrangement with each other and are indexed by time 
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or some other parameter. An image sequence can be in Cartesian coordinates, but other 
coordinate systems in the art can be used. 

In addition, the present apparatus and method can be utilized in non-video 
applications such as speech, audio, and electrocardiogram compression. That is, even 
5 though the invention disclosed herein is illustrated on a two-dimensional system (2 D), 
i.e., video compression, it is intended that the teachings can be applied to any other 
dimensional systems so to advance the art of data compression in general. 

For example, the teachings can be applied to one and one-half dimensional 
systems (1-1/2 D) such as ultra-sound imaging. Also, the teachings can be applied to 
10 three dimensional systems (3 D) such as magnetic resonance imaging (MRI). 

Throughout the description below, the term "frame" refers to a single image of a 
sequence of images fed to an encoder, regardless of the form of the single image, i.e., 
regardless if it is in the time domain, the frequency domain, or of any other processing 
that has been done on it. In addition, the term "pel" is used in reference to a picture 
15 element in the time domain and the terms "coefficient" and "transform coefficient" are 
used in reference to representations of the pels which are generated after the pels have 
passed through, for example, a forward wavelet transform. These terms are used to 
facilitate the description of the embodiments and are in no way intended to restrict the 
scope of the invention. 

20 Referring now to the drawings wherein like reference numerals identify similar 

elements of the subject invention, there is illustrated in FIG. 2 a schematic block 
diagram of an embodiment for compressing a sequence of images or sequence of 
frames. This diagram is one of several embodiments disclosed herein. More detailed 
embodiments are discussed in the paragraphs that follow. 

25 In FIG. 2, an image is converted to a collection of features in the transform 

domain in block 20. The features that are determined to be significant for that image, 
i.e., those features that are determined to have significantly changed from a past or 
reference frame, are selected in block 22. The significant features are efficiently 
represented in block 24 and, thereafter, sent to a decoder to update the features in a 

30 reference frame. 

For example, the original image is transformed in block 20 and represented by a 
transform coefficient set. The transform coefficients of the coefficient set are then 
evaluated in block 22 to determine their significance via various weighting and 
evaluation techniques and ranked according the their significance. Thereafter, in block 

35 24, motion compensation between the present frame and past or reference frame takes 
place. Motion compensation may include motion estimating the change between 
frames to generate a set of motion vectors. Thereafter, the motion vectors are applied to 
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a reference frame during a motion prediction step. The results from motion prediction 
are subtracted from the transform coefficient set to determine the errors of that 
prediction. The prediction errors are then optionally scaled and finally positionally 
encoded along with the motion vectors for transmittal to the decoder. 
5 Referring to FIG. 3, a schematic block diagram illustrates a more particular 

arrangement of the embodiment that was described with reference to FIG. 2. An image 
sequence or series of video frames 26 encoded in, for example, Caltech Intermediate 
Format (GIF) are fed to a converter 28. A GIF frame has 288 x 352 pels. In converter 
28 the frames are converted to quarter GIF (QGIF), for example, QGIF image 30 as 

10 illustrated in FIG. 4a. A QGIF image has 144 x 176 pels. GIF is converted to QGIF by 
low-pass filtering and decimating by two (2) in both the horizontal and vertical 
directions. To facilitate processing, the 144 x 176 pels are divided into image macro- 
blocks (IMBx^x)' ^^^^ having 16x16 pels. QGIF is used herein as an example only 
and is not intended in any way as a limitation with respect to this invention. The 

15 techniques described below are readily adaptable to other image (and non-image) 
formats by methods well known to those skilled in the art. 

Referring to FIGS. 3 and 4, QGIF image 30 is fed to blocks 32 and 36, which 
make up block 20 of FIG. 2, wherein mapping of the images to features takes place. 
More specifically, QGIF image 30 (FIG. 4(a)) is fed to block 32 wherein a forward 

20 wavelet transform transforms each frame to a collection of subbands 34 (FIG. 4(b)). 

This organization of the transformed image, i.e., collection of subbands 34, is stored in 
memory for later use, for example, for motion estimation, motion prediction, and 
determining prediction error. An appropriate forward wavelet transform that may be 
used for this invention is discussed in greater detail herein below. 

25 The collection of subbands 34 are fed to block 36 for subband macro-block 

packing. During subband macro-block packing, the subband blocks that correspond to 
a particular image macro-block are organized to form subband macro-blocks (SMBx^x)* 
Thereafter, each subband macro-block resides at the spatial location of the image 
macro-block which it is related to and therefore represents. The collection of all 

30 subband macro-blocks for a particular frame is called a subband macro-block grouping 
40. 

FIG. 5 illustrates the process of subband macro-block packing. During subband 
macro-block packing, all related subband blocks in the collection of subbands 34 (FIG 
5(a)) are reorganized during subband macro-block packing to form subband macro- 
35 blocks 38 as illustrated in FIG 5(b). 

For example, the shaded subband blocks in FIG. 5(a), corresponding to image 
macro-block 2,4 (IMB2^4) in FIG. 4(a), are reorganized during subband macro-block 
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packing in block 36 (FIG. 3) to form subband macro-block SMB2^4 as illustrated FIG. 
5(b). The subband macro-blocks 38 (SMBq^q through SMBg jq) then organized 
into subband macro-block grouping 40 as illustrated in FIG. 5(c) such that each 
subband macro-block is supported by the spatial position of its corresponding image 
5 macro-block (IMBx^x) QCIF image 30. In this example, SMB2^4 is found to be 
significantly supported by the spatial location of IMB2^4 as illustrated in FIGS. 4(a) 
and 5(c). 

It is important to note again that although the embodiment described herein 
refers only to frame images represented in QCIF, those skilled in the art will readily 
10 understand that other formats may be used without deviating from the teachings of this 
invention. It is also important to note that the particular grouping of subband blocks in 
each subband macro-block is used to accommodate the particular wavelet illustrated. 
Other groupings of subband data exist which would be more appropriate for other 
wavelets. 

15 From the above descriptions of the collection of image macro-blocks 30 (FIG. 

4(a)), the collection of subbands 34 (FIG. 4(b)), and the subband macro-block grouping 
40 (FIG. 5(c)) it should be readily apparent that there is a correlation between certain 
image macro-blocks, subband blocks, and subband macro-blocks. An example of such 
a correlation is as follows: (a) image macro-block 2,4 (IMB2^4), which is shaded and 

20 also identified as image macro-block 106 in FIG. 4(a); (b) all of the shaded subband 
blocks in FIG. 4(b), for example, subband block 1 16 in subband 00 (SBqq) and 
subband block 1 18 in subband 33 (SB33); and (c) subband macro-block 2,4 (SMB2^4), 
which is shaded and also identified as subband macro-block 1 17 in FIG. 5(c). 
Descriptions in this specification that involve coefficients having a relationship such as 

25 that exemplified above may be referred to as being 'related*. 

Referring again to FIG. 3, subband macro-block grouping 40 is fed to blocks 42, 
46, 48, and 52, which make up block 22 found in FIG. 2, where it is determined which 
features or subband macro-blocks (SMBq^O through SMBg 10) have changed. In 
particular, subband macro-block grouping 40 is fed to block 42 wherein weights are 

30 applied to scale each subband macro-block in subband macro-block grouping 40 by an 
amount which equalizes the perceptual importance of the subband macro-block. The 
output of weighting block 42 is weighted grouping 44. 

Perceptual importance through weighting can be determined, for example, 
through a Mean Opinion Score study, or determined from weights used in other coding 

35 systems such as those found in H.261 and H.263 of the Consultative Committee for 
Intemational Telegraph and Telephone (CCITT), the standards of which are 
incorporated by reference herein. For a discussion of Mean Opinion Scoring see 
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Discrete Cosine Transform , K. R. Rao & P. Yip, Academic Press, Inc., pp. 165-74 
(1990), incorporated by reference herein. 

After weights have been appUed in block 42 to scale each subband macro-block, 
weighted grouping 44 is fed to and processed in change detect block 46 to determine 
5 the relative amount of change that has occurred. This change is also termed the 
^significance' or, for the purpose of video, the distortion of weighted grouping 44. 
Significance can be determined in relation to a given reference such as, for example, 
zero (0) or a past weighted grouping. The loop extending from change-detect block 46 
includes a frame delay 48 which returns a past weighted grouping to change-detect 

10 block 46 for use as a reference. The output of change-detect block 46 is change 
detected grouping 50. 

A zero (0) reference is used in change detect block 46, for example, when 
initially transmitting frames through the encoder. In this case, the entire frame is 
referenced to zero (0). This is also known as m/raframe referencing. As described 

15 above, a past weighted grouping can also be used wherein the macro-block grouping is 
weighted in block 42 as described above and thereafter delayed in delay block 48 of 
change-detect block 46 for use as a reference. This later method, also known as 
interframe referencing, eliminates repeatedly sending redundant and/or unimportant 
information to the decoder. 

20 An alternative use of zero (0) frame referencing is for reproducing and 

maintaining a relatively accurate reference image at the decoder during system 
operation. One method employs periodically applying a zero (0) reference to the 
entirety of every eighth (8th) frame of the standard 30 frames/second. Alternatively, 
the image can be stoichastically refreshed such as by randomly, or methodically, 

25 referencing subband blocks to zero (0). To facilitate any process that references all or a 
part of a frame to zero (0), the zero-referenced subband blocks are identified as such so 
to prevent motion compensation operations (described below) to be performed on the 
effected blocks. Thus, the identified subband blocks are reproduced in whole at the 
decoder for refreshing either the entire reference or a part of the reference therein, as the 

30 case may be. 

Referring again to FIG. 3, the collection of subbands 34 that were earlier stored 
in memory and the subband macro-blocks of change-detected grouping 50 are ranked in 
block 52 according to the amount each subband block is determined to have changed, 
i.e., in accord with their significance. Ranking is based on the values previously 
35 assigned by weighting and detecting the subband macro-blocks in blocks 42 and 46, 
respectively. The output of block 52 includes a ranked subband grouping 53 and a 
ranked subband macro-block grouping 54 which are transmitted over line 55. 
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With continued reference to FIG. 3, ranked subband grouping 53 and ranked 
subband macro-block grouping 54 are selectively fed to blocks 56, 60, 62, 68, 72, and 
76, which correspond to block 24 of FIG. 2, wherein the changed macro-blocks are 
efficiently represented. In particular, ranked subband macro-block grouping 54 (the 
5 'present* frame) is fed to block 56 for motion estimation. Ranked subband grouping 53 
is fed to delay block 62, thereafter providing a delayed ranked subband grouping 57 
(the 'reference' frame) to line 64 for motion estimation and motion prediction in blocks 
56 and 60, respectively. A collection of motion vectors 58 are generated in motion 
estimation block 56, in a marmer to be described below, and fed to block 60 for motion 

10 prediction and also sent to block 76 for positional encoding. 

The motion vectors 58 sent to motion prediction block 60 are used to alter 
delayed ranked subband grouping 57 so to generate a predicted grouping 66. A 
difference block 68 receives ranked subband grouping 53 and subtracts predicted 
grouping 66 therefrom, resulting in grouping differences 70, i.e., the prediction error. 

15 The grouping differences 70 are further scaled in block 72 resulting in scaled grouping 
differences 74, Those skilled in the art will recognize that the fewer the number of non- 
zero grouping differences 70, the more accurate the collection of motion vectors 58 
have predicted the changes between the present frame and the reference frame. And, 
the fewer the differences the less bits that must be transmitted to the decoder to correct 

20 for deficiencies in motion estimation. 

The scaled grouping differences 74 from scaling block 72 and the collection of 
motion vectors 58 from motion estimation block 56 are positionally encoded as macro- 
blocks in block 76. Therein, the data is efficiently organized into a bit stream. 
Encoded bit stream grouping 78 is output from block 76 and transmitted via 

25 transmission line 80 to a decoder 82 for inverse processing. Transmission can be 
through a variety of mediums, for example, electronic, electromagnetic, or optical. 

Regarding bit stream formatting, there are several standard methods well known 
in the art for formatting bit streams. The format used in an H.263 based encoder system 
is one example. A bit stream is basically a serial string of bit packets. Each packet 

30 representing a particular category of data. 

For example, bit packets may include system level data, video, control, and 
audio data. As data is received for positional encoding in block 76, it is organized into 
bit packets in accordance with the format in use. Generally, a collection of bit packets 
representing a video frame starts with a bit identifying it as a new frame. The amount 

35 of quantization and other control codes typically follow. Thereafter there is encoded a 
list of macro-blocks representing the scaled grouping differences 74. For QCIF, the 
number of macro-blocks equals ninety-nine (99). (See FIG. 5(c).) 
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To facilitate more efficient transfer of data, each macro-block is proceeded by a 
macro-block zero bit (MBZero-bit) which indicates the presence or absence of non-zero 
data in a macro-block. If the macro-block is present, control information for the macro- 
block, including the related collection of motion vectors 58, is sent followed by the 
5 subband data, i.e., the related scaled grouping differences 74. Including such 

information substantially reduces the number of bits that are sent over transmission line 
80 in that the absence of a macro-block is represented by a single symbol instead of the 
all of the bits that would be necessary to identify the entire string of macro-block 
coefficients that are equal to zero. 

10 Another situation wherein further efficiencies can be had is when only some of 

the subband blocks within a subband macro-block are zero. An embodiment includes 
the step of flagging the subbands whose coefficients are equal to zero with a subband 
zero flag (SBZero flag). A subband from scaled grouping differences 74 whose 
coefficients are zero indicates that no changes were found to exist between 

15 corresponding subband blocks of ranked subband grouping 53 and predicted grouping 
66. It takes substantially fewer bits to represent SBZero flags than to separately 
represent each coefficient equaling zero. Of course, the decoder is programmed to 
recognize both the MBZero-bit and the SBZero flag so to interpret the symbol 
introduced during positional encoding in block 76. An example of zero-runs length 

20 codes for symbolizing strings of zeros follows. 



Zero-Runs Length Codes 



Zero Code 


Number of Consecutive Zeros 


01 


single zero 


001 bo 


2,3 


0001 bi bo 


4,5,6,7 


00001 b2 bi bo 


8,9,10,11,12,13,14,15 


00000 1 bs b2 bi bo 


16... 31 


zeros{\og^ (^))«MSB - 1'' MSB - 2 ' " ' "l"o 


For general N 



With continued reference to FIG. 3, encoded bit stream grouping 78 is received 
by decoder 82 via transmission line 80 and is fed to a positional decoding block 86 
25 which reverses the effect of positional encoding block 76. The collection of motion 
vectors 58 are extracted from bit stream grouping 78 and fed to a prediction block 98, 
The decoded scaled grouping differences 88, in subband form (FIG. 4(b)), are provided 
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to a quantum recovery block 90. In quantum recovery block 90, past transform 
coefficients, and past and present dequantization terms are used to recover the 
quantized transform coefficients' values, i.e., they are used to recreate the grouping 
differences 70. 

5 A collection of subbands 92, the encoder's reference frame, is fed to a delay 

block 94. A delayed collection of subbands 96 is fed from the delay block 94 to a 
prediction block 98. Similar to the process carried out in motion prediction block 60 of 
the encoder, the collection of motion vectors 58 are applied to the delayed collection of 
subbands 96 in prediction block 98. Therein, the delayed collection of subbands 96 is 

10 altered to generate a predicted grouping 100, i.e., a subband representation of the 

updated image not including the grouping differences 70. Grouping differences 70 and 
predicted grouping 100 are added in an adder block 102 generating the collection of 
subbands 92, i.e., a new reference frame. Finally, an inverse wavelet transform is 
performed in block 104 on the collection of subbands 92. This step is essentially the 

15 reverse of the forward wavelet transform 32 that was briefly described above and which 
will be described in greater detail herein below. The resulting output from block 104 is 
a reconstructed image 105. 

As previously described and illustrated in FIGS. 3 and 4, QCIF image 30 (FIG. 
4(a)) is fed to forward wavelet transform 32 which transforms each video frame to form 

20 the collection of subbands 34 (FIG. 4(b)). An embodiment of transform block 32 

utilizes a tensor product wavelet transform. For a detailed discussion of tensor product 
wavelet transforms see Standard Wavelet Basis Compression of Images , Joel Rosiene 
and Ian Greenshields, Optical Engineering, Vol. 33, Number 8 (August 1994), 
incorporated by reference herein. Other finite precision transforms may be utilized 

25 such as the well-known Mallat, GenLOT, or Harr transforms. For a discussion of such 
suitable altemative wavelet transforms see Wavelets and Filter Banks , G. Strang and T. 
Nguyen, Wellesley-Cambridge Press (1997), incorporated by reference herein. 

Referring to FIG. 4(b), there is illustrated the collection of subbands 34 after 
QCIF image 30 has passed through forward wavelet transform 32. As previously 

30 indicated, the forward wavelet transform process utilizes the tensor product wavelet 
transform or other well knovm finite precision transforms as modified herein to reduce 
the effects of a finite precision implementation. Generally, the transform process will 
consist of mxn stages to produce (w+l) x («+l) subbands. In one embodiment, 
discussed below in conjunction with FIG. 6, the transform process consists of 3 x 3 

35 stages to produce a total of sixteen (16) subbands. Other embodiments can be made 
following the disclosure provided herein that would be within the scope of this 
invention. 
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Referring to FIG. 6(a), a forward wavelet transform process initially filters a 
QCIF image frame 30 on a row-by-row basis using three stages. Each stage includes a 
low pass filter 108 and a high pass filter 110. In one embodiment, each low pass filter 
108 has a value of -1, 2, 6, 2, -1 and each high pass filter has a value of -1, 2, -1. 
5 After filtering, the low pass components and high pass components are scaled 

and decimated, or dovsoisampled, at each stage by decimators 1 12 and 1 14, respectively, 
whereby components of the sample values comprising a discrete signal are eliminated. 
In the illustrated embodiment, the input image is downsampled by a factor of two (2) so 
to discard every other sample. Decimating by two (2) ultimately results in the 

10 necessary and sufficient transform coefficients to enable an exact reconstruction of the 
input. Thereafter, the dovmsampled values of the low pass components and high pass 
components are normalized at each stage in a manner that will be described in more 
detail herein below with respect to FIG. 7. The output of the first stage includes a low 
pass filter component Aqr and a high pass component Dqr. Low pass component Aqr 

15 is decomposed a second time and then a third time resulting in additional row details 
D]R and D2R5 and row average A2R, 

The row outputs Dqr, D^r, D2R, and A2R of the row stages shown in FIG. 6(a) 
are then applied on a column-by-column basis to the stages shown in FIG. 6(b). Each 
of the three stages shown in FIG. 6(b) include a filter pair, downsampling, and 

20 normalization processes that are applied in the same manner as discussed above in 
conjunction with FIG. 6(a). The transform output is a collection of subbands 34 as 
discussed above with regard to FIG. 3 and as illustrated in FIG. 4(b). 

Referring now to FIG. 4(b), for identification purposes, each subband is 
identified by a subband designation SBy, where / = 0, 1, 2, or 3 for each row and j = 0, 

25 1, 2, or 3 for each column. The shaded subband blocks, for example, subband block 

1 16 in SBqo and subband block 1 18 in SB33, correspond with IMB2^4 in QCIF image 
30 of FIG. 4(a). Due to the decimation process described above, each corresponding 
subband block is reduced proportionally such that, for example, subband block 1 16 in 
SBqo includes 8x8 coefficients and subband block 1 18 in SB33 includes 2x2 

30 coefficients. As discussed above, the related subband blocks, e.g., those subband 

blocks in each subband (SBqo through SB33) that are found in subband positions 2,4, 
are collected during the step of subband macro-block packing in block 36 (FIGS. 3 and 
5) to facilitate certain processing steps. 

Referring now to FIG. 7, in accordance with a feature of the disclosed 

35 embodiment, the remainder for each stage of the subband encoding process is 

propagated to the opposite filter path in order to compensate for errors introduced due 
to the finite precision transform. The propagated remainder is utilized to adjust the 
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coefficients on the opposite filter path to account for the loss of precision. The process 
results in a non-linear transform. Further, the process by which the filters are altered 
may make them neither bi-orthogonal nor orthogonal. 

FIG. 7 illustrates an implementation for propagating the remainders to opposite 
5 filter channels for the first stage of the row transform shown in FIG. 6(a). A similar 
implementation is included at each of the row stages and column stages. The 
coefficients of input frame 30 are filtered in low pass filter 108 and high pass filter 110 
in the normal manner. The results are, respectively, downsampled in samplers 112 and 
114. The decomposed results of low pass filter 1 08 are normalized in a low pass 

10 normalization process 120 producing a low pass normalized output 122 and a low pass 
remainder rl. The decomposed results of high pass filter 1 10 are normalized in a high 
pass normalization process 124 producing a high pass normalized output 126 and a high 
pass remainder rh. The remainders rl and rh resulting from each normalization process 
120 and 124, respectively, are each passed through functions g(rl,rh) 128 and f(rl,rh) 

15 130 as illustrated. The results of function g(rl,rh) 128 are added to low pass normalized 
output 122 in adder 132 resulting in Aqr (first stage averages). The results of function 
f(rl,rh) 130 are added to high pass normalized output 126 in adder 133 resulting in Dqr 
(first stage lost details). 

For the filters L == {-1, 2, 6, 2, -1} and H = {-1, 2, -1}, an embodiment of the 

20 functions of the remainders are: f(rl,rh) = floor(rh +1/2) with nh = V2; and g(rl,rh) = rh. 
The above described manipulation of the remainders is repeated for each filter pair, 
resulting in reduced bit allocation at the transform output. 

An embodiment of a tensor product wavelet pair is of the form: 



A=^2/- 



^2/-l +^2/+l 



25 



A +2 



where: 



30 



= input data; 

= data that precedes input data 

= data that follows input data Xjf, 

= detail term (decimated high pass filter output); 

= detail term that follows detail term ; and 
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Ai = approximation term (decimated low pass filter output). 

The above description of the tensor product wavelet transform illustrates a two- 
way split into high pass (details) and low pass (approximations) components. In 
addition, the description illustrates the possibility of propagating remainders from a 
5 first band to a second band, from the second band to the first band, or both from the 
first band to the second band and from the second band to the first. The embodiment 
described above is intended to illustrate the basic concepts of the invention and should 
in no way be interpreted to limit the scope of the invention. 

For example, a tensor product wavelet transform can have a first stage where a 

10 three-way split includes a high pass filter, a medium pass filter, and a low pass filter. 
The output of the low pass filter can then be iterated, i.e., a second stage having a three- 
way split can be applied to the output of the low pass filter, resulting in a total of five 
(5) subbands. In such an embodiment the remainders could be propagated from the low 
pass filter and the high pass filter to the medium pass filter. This embodiment is just 

15 one example of how the tensor product wavelet transform can be varied and still remain 
in keeping with the scope and spirit of the disclosed invention. Those skilled in the art 
will readily understand that there are numerous other ways in which the input can be 
split at each stage and interated, and also that there are numerous other ways in which 
the remainders can be propagated between subbands. 

20 In addition, the above description of the propagation of remainders is not 

intended to limit its use to a tensor product wavelet transform. It can be used with any 
transform. For example, the propagation of remainders can be used with a Discrete 
Cosine Transform (DCT). Also, the propagation of remainders can be used in a loss- 
less or lossy manner. 

25 As discussed herein above, the output of forward wavelet transform 32 can be a 

complete representation or an over-complete representation of QCIF image 30. A 
complete representation of QCIF image 30 includes a collection of subbands that are 
just enough to represent the contents of the image. An over-complete representation of 
QCIF image 30 includes the complete representation and redundant, altemative, or 

30 additional subband representations to facilitate motion compensation as will be 

described herein below. Each representation has value in the disclosed embodiment. 
For example, the over-complete representation can include a variety of image changes, 
such as translational movement, rotational movement, and scaling. These changes can 
be recalled as necessary during motion compensation, reducing the problem of 

35 representing image changes to one of indexing. 

It should be noted with regard to the forward wavelet transform described above 
that although the transformed image frame structures illustrated herein are for the luma 
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components, the structures also hold for the chroma components and, therefore, have 
not been separately described. 

Regarding change-detect block 46 described herein above with respect to FIG. 
3, it is noted that a zero (0) reference, or some other reference such as, for example, a 
5 past weighted grouping supplied through delay 48, may be used to detect how much 
weighted grouping 44 has changed. An embodiment of change-detect block 46 
includes a change detection metric, to which weighted grouping 44 is to be applied, of 
the general form; 

i 

10 where: = a measurement of distortion relative to reference R; 

W, = applied weights; 

G = a present grouping of subband transform coefficients; and 
7? - a reference such as, for example, zero (0) or a 

previous grouping of subband coefficients obtained 
15 through delay block 48. 

A change detection metric may take the more specific form: 

e,=W,\\G-Rll+W,\\G-Rt^ 

In addition, change-detect 46 can take advantage of information provided by a 
feed-back 132 (FIG. 3) from encoded bit stream grouping 78 to eliminate certain 
20 weighted macro-blocks in weighted grouping 44 if any are determined to be too 
expensive, in terms of bit allocation, to be output from change-detect 46. Further, 
change-detect block 46 may replace one feature, e.g., subband block, with another 
which it deems better representative of the feature. 

As described herein above and illustrated in FIG. 3, ranked subband grouping 
25 53 and ranked subband macro-block grouping 54 are fed to delay block 62 and block 

56, respectively, via line 55 for motion estimation. At block 56 a comparison process is 
carried out between the subband blocks of ranked subband macro-block grouping 54, 
i.e., the 'present* frame, and related search regions of delayed ranked subband grouping 

57, i.e., the 'reference* frame. Those skilled in the art will recognize certain advantages 
30 in utilizing reuiked subband macro-block grouping 54 for the present frame and delayed 

ranked subband grouping 57 for the reference frame. However, it should also be 
recognized that other groupings and combinations that are in keeping with the teachings 
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of this invention may be utilized. The comparison process carried out in block 56 
results in a collection of motion vectors 58 which are fed to block 60 for motion 
prediction and to block 76 for positional encoding into a bit stream, as briefly described 
herein above. 

5 Referring to FIGS. 8 and 9, motion estimation in block 56 and the generation of 

a collection of motion vectors 58 will now be more particularly described. In FIG. 8, 
delayed ranked subband grouping 57 is illustrated. Delayed ranked subband grouping 
57 is similar to the collection of subbands 34 illustrated in FIG. 4(b), but has been 
further processed by having its subband blocks ranked in block 52 (FIG. 3) and by 

10 having been delayed by at least one frame in delay block 62. To facilitate determining 
the individual motion vectors, search regions are defined about subband blocks in at 
least one of the subbands (SBqo through SB33). The subband blocks within each 
subband that are selected to have search regions defined about them are those that were 
defined as significant in change-detect block 46. It is often sufficient to develop the 

15 motion vectors based on the significant subband blocks within SBqo- 

With continued reference to FIG. 8, there are illustrated search regions that have 
been developed about. each subband block that corresponds to image macro-block 2,4 
(IMB2^4) of QCIF image 30 (FIG. 4(a)). The size of the search regions may be varied. 
However, the search regions about the subband blocks will always be proportional 

20 according to their fractional relationship with the image. For example, a basic search 
region of P x P pels in QCIF image 30 (FIG. 13) translates to a search region about 
subband block 137 in SBqo of P/2 x P/2 (FIG. 8), as indicated at 136, and a search 
region about subband block 140 in SBqi of P/4 x P/2 as indicated at 139. 

For the examples of motion estimation provided herein below, the P x P search 

25 region 107 of FIG. 13 is to include 32 x 32 pels, which is four (4) times the size of 
IMB2,4, having 16x16 pels. Therefore, the P/2 x P/2 search region 136 (FIG. 8) 
includes 16x16 coefficients which is four (4) times the size of subband block 137 (8 x 
8 coefficients). And, the P/4 x P/2 search region 139 includes 16x8 coefficients which 
is four (4) times the size of subband block 140 (8 x 4 coefficients). As will be fiarther 

30 described herein below, the subband search regions are used to facilitate determining 
motion vectors for each significant subband block (0,0 through 8,10) in some or all of 
the subbands (SBqo through SB33). 

The basic size (P x P) of the search region can be determined by empirical or 
statistical analysis taking into consideration, for example, the amount of movement 

35 anticipated between frames. In addition, consideration should be given to the 

computational effort needed to carry out a search in a given search region. It is readily 
understood by those skilled in the art that larger search regions require more 
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computational resources and, hence, more interframe delay for a fixed processor. 
Conversely, smaller search regions require less computational resources but sacrifice 
image quality. This is especially true during high image-movement periods. That is, 
the quality of the image is reduced since part of the motion may be located out of the 
5 search region, thus preventing accurate motion vector selection. 

As described above, ranked subband grouping 53 and ranked subband macro- 
block grouping 54 are fed from block 52 to delay block 62 and motion estimation block 
56, respectively, over line 55. For the example herein below, a search region is placed 
about subband block 2,4 of SBqo delayed ranked subband grouping 57 (FIG. 8). 

10 And, a subband block of SBqo in subband macro-block 2,4 in ranked subband macro- 
block grouping 54 (ref. subband block 1 16 in FIG. 5(c)) is used to traverse the search 
region for change. However, as noted above, any selection of the subbands or all of the 
subbands may be used following the below described method. 

Referring now to FIGS. 3, 8, and 9, as described above, ranked subband 

15 grouping 53 is delayed in delay 62 producing delayed ranked subband grouping 57 (the 
'reference' frame). Delayed ranked subband grouping 57 is fed to motion estimation 
block 56 wherein a search region 136 is identified to as having a P/2 x P/2 region in 
SBqo about subband block 137. For this example, the search region is equal to 16 x 16 
coefficients. Ranked subband macro-block grouping 54 (the 'present' frame) is also fed 

20 to motion estimation block 56 wherein a subband block 138 (FIG. 9(a)), similar to the 
shaded area of subband block 1 16 in FIG. 5(c), is retrieved for use in the comparison 
process described below. 

Referring now in particular to FIGS. 9(a) through (d) there is illustrated the 
process by which motion vectors (MV^^x) determined in motion estimation block 

25 56 of FIG. 3. In the below example, a motion vector is determined for one subband 
block, i.e., subband block 2,4, of SBqo- However, motion vectors can be determined 
for each significant subband block in each subband (SBoo-through SB33). 

Referring to FIG. 9(a), subband block 138 of ranked subband macro-block 
grouping 54 is located within the search region 136 of delayed ranked subband 

30 grouping 57 (FIG. 8). Subband block 138 is essentially superimposed on subband 
block 137 of delayed ranked subband grouping 57. As discussed above, ranked 
subband macro-block grouping 54 has a structure similar to subband macro-block 
grouping 40 illustrated in FIG. 5(c). And, delayed ranked subband grouping 57 has a 
structure similar to the collection of subbands 34 illustrated in FIG. 4(b). Referring 

35 again to FIG. 9(a), coefficients 141 of search region 136 (illustrated as four (4) circles 
with an 'x' in each) and coefficients 142 of subband block 138 (illustrated as four (4) 
circles) are used herein to facilitate illustrating the method of determining motion 
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vectors. It is assumed for this example that coefficients 141 and 142 are approximately 
equal in value and that the remaining coefficients (not shown) are of a different value 
from coefficients 141 and 142, but approximately equal to each other. The difference 
in the positions of coefficients 141 and 142 represents a change between two video 
5 frames, e.g., translational movement. 

Referring to FIG. 9(b), subband block 138 traverses, i.e., searches in a 
predetermined stepwise pattem, search region 136 seeking to determine the total 
absolute difference at each step between subband block 138 and search region 136. 
Those skilled in the art will recognize that various traversing patterns can be used. In 

10 addition, criterion other than total absolute difference can be used as a basis for the 

comparison. The initial comparison seeks to find the best match utilizing incremental, 
or whole step, movements of subband block 138. An incremental movement is a full 
shift, or step, in either the jc orj^ direction. For example, in searching the entire search 
region 136, subband block 138 shifts within search region 136 by ± 4 increments, i.e., 

15 transform coefficients, in the x direction and ± 4 increments in the y direction. Subband 
block 138 shifts ±4 increments in the x andy direction because subband block 138 has 8 
X 8 coefficients while search region 136 has 16x16 coefficients. 

With continued reference to FIG. 9(b), after conducting an incremental search, 
the best match is found to be three (3) full incremental movements in the positive x 

20 direction and two (2) full incremental movements in the positive y direction. 

Thereafter, as viewed in FIG. 9(c), fractional differences are determined to more 
accurately represent the difference between subband block 138 and search region 136. 
To facilitate this process, masks representing fractional movement appropriate for the 
particular subband are applied to subband block 138. 

25 For example, because SBqq is one-quarter (1/4) the size of the related macro- 

block in the original image (see IMB2^4 of FIG. 4(a)), there are four fractional 
movements that subband block 138 can make to more accurately reproduce the finer 
movements of IMB2^4. That is, subband block 138 can move ±1/2 of an increment in 
the jc direction and ±1/2 of an increment in the 3/ direction. Therefore, four fi*actional 

30 masks 143 are used to alter subband block 138 in search of the best match. 

With continued reference to FIG. 9(c), the four masks 143 are applied to 
subband block 138. Between the application of each mask the total absolute difference 
between the coefficients in subband block 138 and search region 136 is determined. If 
a better match is found, in comparison to that determined during incremental searching 

35 described above, the fi-actional mask is added to the motion vector. In the example, the 
best match is determined to be +1/2 fractional movement in the positive x direction. 
The resulting x andy components of the motion vector are +3 1/2 and +2, respectively. 
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Those skilled in the art will recognize that it is unusual to obtain as exact a match as 
that illustrated in the above example. In this sense, the 'best match' between the 
coefficients of the subband block and the coefficients of the search region may more 
accurately be described as the 'closest approximation' between the two. Motion 
5 prediction is used later to compensate for this lack of accuracy. 

Referring to FIG. 9(d), the x and j/ components of the motion vector have their 
signs inversed and are scaled. More particularly, multiplying each of the x and 
components by negative one (-1) and, in this example where SBqo is used for motion 
estimation, multiplying each of the x and >' components by two (2). The signs of the x 

10 and y components are inversed so that when the motion vectors are applied to delayed 
ranked subband grouping 57 during motion prediction (discussed in more detail below), 
the appropriate coefficients are moved from the 'previous' frame position to the 'present' 
frame position. And, the x andy components are scaled up so to represent the 
movement determined above (x = 3 \/2,y = 2) in terms of the related macro-block in 

15 the original QCIF image (IMB2^4). Scaling allows a more simple determination of the 
X and y components used in shifting the appropriate coefficients in subbands SBqo 
through SB33 during motion prediction. 

In the example, the resulting motion vector identifying the movement of the 
subband blocks in SMB2^4 is x = -7 and>^ = -4 (MV2^4). MV2^4 is stored in memory 

20 with the collection of motion vectors 58. MV2^4 therefore represents the movement of 
certain collections of coefficients from each subband in delayed ranked subband 
grouping 57 (the 'reference' frame) to their new positions so to predict ranked subband 
grouping 53 (the 'present' frame). The above process is repeated for each significant 
subband block in, for example, SBqo* Processing typically proceeds in the order of 

25 ranking, that is, from the macro-blocks having the greatest amount of movement to 

those having the least amount of movement. Entirely insignificant subband blocks will 
not be considered at all and therefore will have no motion vector assigned. This will 
occur, for example, when there is insignificant or no change at those locations between 
frames. It can also occur when subbands blocks are zero (0) referenced as described 

30 herein above. 

If a different subband is to be used to calculate the motion vectors, incremental 
and fractional movements would be determined in a manner analogous to that described 
above using the proportional relationship of the particular subband with respect to the 
QCIF image 30. For example, if subband blocks in SBqi are used to develop the 
35 motion vectors, the following criterion would apply: search region size =16x8 

coefficients; x fractional masks = ±1/4, ±1/2, and ±3/4 increments; y fractional masks = 
±1/2 increments; x scaling = 4; andj^ scaling = 2. 
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An advantage of using the above method is that separable fihers can be 
employed. In other words, filters used for incremental and fractional movement of one 
subband block can be used for incremental and fractional movement of another subband 
block. For example, subband blocks in SBqo have four (4) possible fractional 
5 movements of x = ±1/2 and^ = ±1/2. And, subband blocks in SBqi have eight (8) 
possible fractional movements of x = ±1/4, ±1/2, and ±3/4, andj^ = ±1/2. Because of 
the common fractional movements of x = ±1/2 andy = ±1/2 in SBqo and SBqi, single 
separable filters can be used for fractional movements of x = +1/2, x = -1/2, = +1/2, 
andjv = -1/2 in both subbands. This method can be used for all common fractional 

10 movements in delayed ranked subband grouping 57. The same advantageous use of 
separable filters can be carried out in motion prediction block 60. 

Referring to FIG. 10, after all significant subband blocks have been processed in 
motion estimation block 56, the collection of motion vectors 58 are output to motion 
prediction block 60 and positional encoding block 76. In motion prediction block 60 

15 the motion vectors are used to calculate the shift of certain collections of coefficients 
from each subband in delayed ranked subband grouping 57 (the 'reference* frame) to 
their new positions so to predict ranked subband grouping 53 (the 'present' frame). 

To determine which masks to use to produce such shifts, the x and y 
components are multiplied by the reciprocal of the corresponding modulo of each 

20 subband block. For example, to determine the x and y components for shifting the 8x8 
collection of coefficients 148 that have been determined to have moved to the 2,4 
position in SBqo. the x and j^' components of MV2^4 are each multiplied by the 
reciprocal of the corresponding modulo two (2). This calculation results in x = -3 1/2 
andy = -2. Therefore, a mask for incremental movement of x = -3, a mask for 

25 fractional movement of x = -1/2, and a mask for incremental movement of = -2 are 
applied to the 8x8 coefficients 148. 

As a second example, to determine the x and y components for shifting the 8x4 
collection of coefficients 149 that have been determined to have moved to the 2,4 
position in SBqi, the x component of MV2^4 is multiplied by the reciprocal of modulo 

30 four (4) and the component of MV2^4 is multiplied by the reciprocal of modulo two 
(2). This calculation results in x = -1 3/4 andj; = -2. Therefore, a mask for incremental 
movement of x = -1 , a mask for fractional movement of x = -3/4, and a mask for 
incremental movement of = -2 are applied. 

FIG. 10 illustrates the movement of all the collections of coefficients to the 

35 subband blocks corresponding to SMB2^4. The application of all motion vectors 

(MVx^x) from the collection of motion vectors 58 to delayed ranked subband grouping 
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57 (the 'reference' frame) results in a prediction of ranked subband grouping 53 (the 
'present* frame) and is called the predicted grouping 66 (FIG. 3). 

An alternate embodiment of the above described masking process for 
determining fractional movement between frames includes the use of 3 x 3 coefficient 
5 masks. These masks take a weighted average of the coefficients surrounding a selected 
coefficient. In the alternate approach, a collection of motion vectors 58 that include 
only incremental movements is determined as described above and illustrated in FIG. 
9(a) and 9(b) for each significant subband block in each subband (SBqo through SB33) 
or a select number subbands, e.g., SBqq only. The collection of motion vectors 58 are 

10 fed to motion prediction block 60. 

In motion prediction block 60, the collection of motion vectors 58 is applied in 
a manner analogous to that illustrated in FIG. 10 causing significant subband blocks of 
the delayed ranked subband grouping 57 to be incrementally shifted. Thereafter, each 
coefficient of each shifted collection of coefficients has a 3 x 3 mask applied to it. The 

15 masks that are applied determine the weighted average of the coefficients surrounding 
each shifted coefficient. The resultant of that calculation is the prediction of the shifted 
coefficient, i.e., the coefficients new value. 

After all of the motion vectors from the collection of motion vectors 58 have 
been applied to delayed ranked subband grouping 57 and all of the coefficients that 

20 were shifted by the motion vectors have had the 3x3 mask applied to them, the result 
is output from motion prediction block 60 as predicted grouping 66. Of course, the 
process is repeated in prediction block 98 of decoder 82 to replicate the masking 
process carried out in motion prediction block 60. 

After the prediction is determined by either of the methods described above, 

25 predicted grouping 66 is passed to difference block 68 wherein the difference between 
ranked subband grouping 53 and predicted grouping 66 is determined. As described 
above, difference block 68 produces grouping differences 70. 

Although the motion compensation methods described herein are illustrated as 
functioning in conjunction with a tensor product wavelet, it is important to note that the 

30 methods can be utilized with other types of transforms. This includes utilizing the 
motion compensation methods with other transforms in either the time domain or the 
transform domain. For example, data transformed in a DCT can be motion 
compensated in a maimer similar to that described above. That is, the 64 transform 
coefficients in each of the 8x8 blocks of the DCT can be motion compensated in a 

35 maimer similar to that used to motion compensate the 64 transform coefficients in each 
of the 8x8 subband blocks in SBqo of the tensor product wavelet transform. 
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Referring now to FIG. 1 1 there is illustrated another embodiment of the video 
encoder. As in the embodiment described above and illustrated in FIG. 3, motion 
estimation and motion prediction are carried out in the transform domain in blocks 150 
and 152, respectively. Also, the front portion of the embodiment is similar to that 
5 discussed above and illustrated in FIG. 3. More specifically, GIF image 26 is converted 
to a QCIF image 30 in converter 28. QCIF image 30 is transformed and converted to a 
subband macro-block grouping 40 by the image to feature mapping components 20. 
And, the collection of subbands 34 and the subband macro-block grouping 40 are 
converted to ranked subband grouping 53 and ranked subband macro-block grouping 
10 54, respectively, by the components associated with determining the features which 
changed 22, 

Also similar to the embodiment illustrated in FIG. 3 is that ranked subband 
macro-block grouping 54 is fed to a motion estimation block 150 and ranked subband 
grouping 53 is fed to difference block 68. However, instead of utilizing a delayed 

15 ranked subband grouping 57 as a reference frame, an error corrected subband grouping 
171, having accumulated errors added thereto, is fed to delay block 156, thereby 
producing delayed subband grouping 172 (the 'reference frame'). Such a variation is 
necessary when quantization (or scaling) is so great that it substantially alters the 
prediction errors 70 produced in difference block 68. 

20 To develop error corrected subband grouping 171, a copy of ranked subband 

grouping 53 is passed unchanged through difference block 68 and stored in memory 
when the system is referenced to zero (0), for example, when the system is initiated or 
when the reference in the decoder is to be refreshed. Thereafter, prediction errors 70 
are accumulated, i.e., added to the reference, as the prediction errors 70 of each 

25 subsequent frame passes quantize block 158. The updated reference image is fed to 
delay block 156, thereby producing delayed subband grouping 172. By utilizing this 
method the reference in the encoder remains synchronized with the reference in the 
decoder. Those skilled in the art will recognize that such an arrangement is usefiil in 
maintaining synchronization between the encoder and decoder when significant 

30 amounts of scaling and/or quantization is carried out between motion prediction and 
positional encoding. 

After motion estimation block 150 and motion prediction block 152 receive the 
delayed subband grouping 172 from delay block 156, motion estimation and motion 
prediction are determined by a procedure similar to that described herein above and 

35 illustrated in FIGS. 8 through 10. In addition, a forward feed 159 is provided between 
change detect 46 and quantize block 158 for adjusting the amount of quantization that 
is to be performed on a particular block, depending on the amount the block has 
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changed. When a large amount of change is detected in change detect 46, a large 
number of bits are allocated for quantization. And conversely, when a small amount of 
change is detected in change detect 46, a proportionately lesser number of bits are 
allocated for quantization. 
5 Referring now to FIG. 1 2 there is illustrated yet another embodiment of the 

video encoder. The front portion of this embodiment is similar to the embodiments 
discussed above and illustrated in FIGS. 3 and 1 1 . However, unlike the embodiments 
described above, motion estimation is carried out in the image domain. This 
embodiment takes advantage of special hardware configurations presently available on 

10 some processors. 

In Fig. 12, a GIF image 26 is converted to a QCIF image 30 in converter block 
28. The QCIF image 30 is transformed and converted to a subband macro-block 
grouping 40 by the image to feature mapping components 20. Subband macro-block 
grouping 40 are processed by the components associated with determining the features 

1 5 which have changed 22 to determine subband macro-block ranking. The results are 
applied to the collection of subbands 34 resulting in ranked subband grouping 53. 
Ranked subband grouping 53 is thereafter fed to difference block 68. 

The QCIF image 30, also referred to as the 'present' frame, is also fed to motion 
estimation block 160 and delay block 166 for determining a collection of motion 

20 vectors 162. More specifically, an image frame 30 is delayed in delay 166 producing a 
delayed image frame 167, also referred to as the 'reference' frame. With reference to 
FIG. 13, delayed image frame 167 is fed to motion estimation block 160 wherein a P x 
P pels search region is developed about each significant image macro-block. For 
example, a P x P pels search region 107 is established about image macro-block 2,4 

25 (IMB2^4). Based on empirical analysis, a search region 107 of 32 x 32 pels is used as 
the search region about a 16 x 16 pels image macro-block of a QCIF image frame. 

In motion estimation block 160, each significant image macro-block (IMBx^x) 
of the present QCIF image 30 frame is located within the corresponding search region 
in the delayed image frame 167 for determining the motion vectors. For example, 

30 IMB2^4 is retrieved from QCIF image 30 and located within search region 107 of 

delayed image frame 167. This process is analogous to that carried out in the transform 
domain as described above and illustrated in FIGS. 8 and 9(a). 

In a maimer analogous to that described above and illustrated in FIG. 9(b), 
IMB2^4 traverses search region 107 seeking to determine the minimum total absolute 

35 difference at each step between IMB2^4 and search region 107. Unlike subband 
searching described above, however, fractional searching is unnecessary when 
searching in the image domain. Therefore, after determining the incremental 
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movement of IMB2^4, the x and coordinates are inversed (multiplied by -1) and 
stored in memory with the collection of motion vectors 162. The motion vectors are 
fed to motion prediction block 154 and positional encoding block 76. Thereafter, the 
motion vectors are applied to delayed subband grouping 1 72 in a manner similar to that 
5 described above with regard to FIGS. 3 and 1 1 and illustrated in FIG. 10. 

Referring now to FIG. 14, another embodiment of the video encoder is 
illustrated wherein the front portion is similar to the embodiments discussed above and 
illustrated in FIGS. 3,11 and 12. However, unlike the embodiments described above, 
both motion estimation and motion prediction are carried out in the image domain. 

10 In FIG. 14, a collection of motion vectors 162 are determined in a manner 

similar to that described above and illustrated in FIGS. 12 and 13. The collection of 
motion vectors 162 are fed to block 164 for motion prediction and to block 76 for 
positional encoding. In a marmer similar to that described above and illustrated in 
FIGS. 1 1 and 12, an error corrected subband grouping 171, having accumulated errors 

15 added thereto, is fed to delay block 156, thereby producing delayed subband grouping 
172 (the 'reference frame'). Unlike the above described embodiments, however, the 
delayed subband grouping 172 is then reconstructed by inverse wavelet transform block 
174 to form a reconstructed image 176. The reconstructed image has a structure that is 
similar to QCIF image 30 illustrated in FIG. 4(a). 

20 Altematively, instead of reconstructing the delayed subband grouping 1 72 in its 

entirety, a portion of the grouping can be reconstructed to effectuate efficiencies. For 
example, a 3, 5 filter can be used to obtain a reconstructed region having 48 x 48 pels. 
Regions are selected based on the significance of, i.e., the detected changes within, the 
image macro-blocks (16 x 16) about which the regions are centered. 

25 In motion prediction block 164, the collection of motion vectors 162 are applied 

to the reconstructed image 176 (or the reconstructed 48 x 48 pels regions if only regions 
were inverse wavelet transformed). The collection of motion vectors 162 are applied to 
the reconstructed reference image 1 76 in a manner analogous to that described above 
and illustrated in FIG. 10 for the shifting collections of transform coefficients in the 

30 subband representation of the QCIF image. Thereafter, a prediction 178 is fed to 

forward wavelet transform block 180 producing predicted grouping 66. The predicted 
grouping 66 is then subtracted from ranked subband grouping 53 in difference block 68 
resulting in grouping differences 70. Quantization is carried out in block 158 and the 
errors are accumulated to maintain the reference (as described above) and are also 

35 forwarded to positional encoding block 76. Positional encoding of the quantized errors 
and motion vectors 1 62 takes place as described above and are forwarded to the 
decoder via transmission line 80. 
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Although illustrated herein as a software implementation, the principles of the 
embodiments of the invention could also be implemented in hardware, for example, by 
means of an application-specific integrated circuit (ASIC). Preferably, the ASIC 
implementation, including the necessary memory requirements, should operate at the 
5 pel rate in order to (i) minimize power consumption consistent with the embodiment, 
and (ii) permit compression of full color video, such as for example a fiill CCIR601, at 
a data rate of less than 13.5 MHz. It is foreseen that power consumption will be 
reduced by a factor of ten (10) times by utilizing an ASIC in comparison to the 
conventional software and processor implementation. 

10 Altematively, optical methods can be employed to produce even further power 

savings. As described above, an approximation to the image is created at each stage of 
the wavelet transform and the details lost by making this approximation are recorded. 
In a photo-electronic or an optical implementation, how the light is gathered and related 
charge is sensed can be adjusted to gather samples of each of the approximation 

15 images. If these approximation images are co-registered in parallel, the detail terms can 
be calculated from these intermediate values by either analog or digital means. 
Preferably, analog means are used to calculate the detail terms as the output of an 
analog stage. 

The detail terms can be quantized through the use of a bit serial analog-to- 
20 digital converter which implements the quantization strategy. The resulting bit stream 
is compressed. In this manner, the photonic/optical device operates, i.e., the number of 
digital transitions which occur, at the compressed data rate rather than at the image data 
rate (as in the case of an ASIC) or the processor data rate (as in the case of a 
conventional processor). This will result in an implementation which consumes very 
25 little current, thus requiring less power. It is foreseen that the implementation of an 

optical method will further reduce power consumption by a factor of approximately ten 
(10) times that of an ASIC implementation. 

It is to be understood that the embodiments and variations shown and described 
herein are merely illustrative of the principles of this invention and that various 
30 modifications may be implemented by those skilled in the art without departing from 
the scope and spirit of the invention. 



